ABSTRACT

The International Union of Basic and Clinical Pharmacology/British
Pharmacological Society (IUPHAR/BPS) Guide to PHARMACOLOGY
(http://www.guidetopharmacology.org) is a new open access resource providing
pharmacological, chemical, genetic, functional and pathophysiological data on
the targets of approved and experimental drugs. Created under the auspices of
the IUPHAR and the BPS, the portal provides concise, peer-reviewed overviews of
the key properties of a wide range of established and potential drug targets,
with in-depth information for a subset of important targets. The resource is
the result of curation and integration of data from the IUPHAR Database
(IUPHAR-DB) and the published BPS 'Guide to Receptors and Channels' (GRAC)
compendium. The data are derived from a global network of expert contributors,
and the information is extensively linked to relevant databases, including
ChEMBL, DrugBank, Ensembl, PubChem, UniProt and PubMed. Each of the ~6000 small
molecule and peptide ligands is annotated with manually curated 2D chemical
structures or amino acid sequences, nomenclature and database links. Future
expansion of the resource will complete the coverage of all the targets of
currently approved drugs and future candidate targets, alongside educational
resources to guide scientists and students in pharmacological principles and
techniques.

INTRODUCTION 

Online resources have become indispensable tools for pharmacology and drug
discovery, in common with other disciplines in the biomedical sciences.
Databases such as ChEMBL (1) and PubChem (2) provide extensive information on
the bioactivity and chemical structures of approved and experimental drugs and
their interaction with targets, either manually curated from the medicinal
chemistry literature (ChEMBL) or uploaded by depositors (PubChem). To
complement these large-scale resources, there is a need for an in-depth,
expert-curated overview of the key targets and ligands, to foster basic and
clinical research and innovative drug discovery, and to educate the next
generation of researchers. The International Union of Basic and Clinical
Pharmacology/British Pharmacological Society (IUPHAR/BPS) Guide to
PHARMACOLOGY portal (http://www.guidetopharmacology.org) is being developed to
assist research in pharmacology, drug discovery and chemical biology in
academia and industry, by providing: (i) an authoritative synopsis of the
complete landscape of current and research drug targets; (ii) an accurate
source of information on the basic science underlying drug action; (iii)
guidance to researchers in selecting appropriate compounds for in vitro and in
vivo experiments, including commercially available pharmacological tools for
each target; and (iv) an integrated educational resource for researchers,
students and the interested public.

The Guide to PHARMACOLOGY portal has been online since December 2011. The
current release of the database (October 2013) integrates two well-established
sources. The first of these is the IUPHAR Database [IUPHAR-DB: (3)], which
provides in-depth, integrative views of the pharmacology, genetics, functions
and pathophysiology of important target families, including G protein-coupled
receptors (GPCRs), ion channels and nuclear hormone receptors (NHRs). The
second is the BPS 'Guide to Receptors and Channels' [GRAC: (4)], a compendium,
previously published in print, providing concise overviews of the key
properties of a wider range of targets than those covered in IUPHAR-DB,
together with their endogenous ligands, experimental drugs, radiolabelled
ligands and probe compounds, with recommended reading lists for newcomers to
each field.

Management and peer review of the new resource are the responsibility of the
IUPHAR Committee on Receptor Nomenclature and Drug Classification
(NC-IUPHAR), which acts as the scientific advisory and editorial board. The
organization has an international network of over 700 expert volunteers
organized into ~60 subcommittees dealing with individual target families. The
subcommittee members contribute expertise in several ways, including
identifying the key pharmacological properties of each target, along with
quantitative activity data from the research literature. NC-IUPHAR also
directly supports the Guide to PHARMACOLOGY through its work in monitoring
'deorphanization' of receptors (i.e. identifying new endogenous ligands),
revising receptor nomenclature in collaboration with the HUGO Gene
Nomenclature Committee (HGNC) database (5-7), liaising with journals, and
developing standards and terminology in quantitative pharmacology (8-10).

The primary sources of data in the Guide to PHARMACOLOGY are distinct from the
medicinal chemistry and natural product literature extracted by ChEMBL. Our
focus is on data and contextual information relevant to the preclinical phases
of drug discovery and includes extensive quantitative and chemical information
manually curated from the primary research literature, predominantly from the
leading non-specialist scientific journals and widely read specialist journals
(Figure 1).



CONTENT AND DATA CURATION 

The current version of the database includes pharmacologically relevant data
and information on 2485 human targets including GPCRs, ion channels, NHRs,
catalytic (enzyme-linked) receptors, transporters and enzymes (including all
protein kinases) (Table 1). Also included is information on the genetics,
emerging pharmacology, functions and pathophysiology of 130 orphan GPCRs (7).

Presently, the resource describes the interactions between target proteins and
6064 distinct ligand entities (Table 1). Ligands are listed against targets by
their action (e.g. activator, inhibitor), and also classified according to
substance types and their status as approved drugs. Classes include
metabolites (a general category for all biogenic, non-peptide, organic
molecules including lipids, hormones and neurotransmitters), synthetic organic
chemicals (e.g. small molecule drugs), natural products, mammalian endogenous
peptides, synthetic and other peptides including toxins from non-mammalian
organisms, antibodies, inorganic substances and other, not readily
classifiable compounds.

The new database was constructed by integrating data from IUPHAR-DB (3) and
the published GRAC compendium (4). An overview of the curation process is
depicted as an organizational flow chart in Figure 2. New information was
added to the existing relational database behind IUPHAR-DB and new webpages
were created to display the integrated information. For each new target,
information on human, mouse and rat genes and proteins, including gene symbol,
full name, location, gene ID, UniProt and Ensembl IDs, was manually curated
from HGNC (5), the Mouse Genome Database (MGD) at Mouse Genome Informatics
(MGI) (11), the Rat Genome Database (RGD) (12), UniProt (13) and Ensembl (14).
In addition, 'Other names', target-specific fields such as 'Principal
transduction', text from the 'Overview' and 'Comments' sections and reference
citations (downloaded from PubMed; http://www.ncbi.nlm.nih.gov/pubmed) were
captured from GRAC and uploaded into the database against a unique Object ID.
For targets present in both IUPHAR-DB and GRAC, entries were cross-checked and
merged. A representative target family page is shown in Figure 3.

For the integration exercise, all ligands listed in GRAC were first checked
against IUPHAR-DB using name-, synonym- and structure-based comparisons. For
over 1000 ligands, there was an existing IUPHAR-DB entry that matched. The
remaining new ligands (~1900) were curated using the workflow already
established for the population of IUPHAR-DB with ligand structures (15). An
overview of the process is outlined below.
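
The name-, synonym- and structure-based comparison described above can be
pictured with a minimal sketch. It is illustrative only and is not the
curators' actual tooling: the LigandRecord type, the normalise and
find_existing functions, the order in which the three comparisons are tried,
and the use of a full InChIKey as the structure key are all assumptions made
for this example.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set, Tuple


@dataclass
class LigandRecord:
    """Hypothetical ligand record; field names are illustrative."""
    ligand_id: int
    name: str
    synonyms: Set[str] = field(default_factory=set)
    inchikey: Optional[str] = None


def normalise(name: str) -> str:
    """Lower-case a name and drop everything except letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def find_existing(
    candidate: LigandRecord, existing: Iterable[LigandRecord]
) -> Optional[Tuple[LigandRecord, str]]:
    """Return (matching record, basis of match) or None if the ligand is new."""
    records = list(existing)
    cand_name = normalise(candidate.name)
    cand_names = {cand_name} | {normalise(s) for s in candidate.synonyms}

    # 1. Primary name against primary name.
    for rec in records:
        if normalise(rec.name) == cand_name:
            return rec, "name"

    # 2. Any name or synonym against any name or synonym.
    for rec in records:
        rec_names = {normalise(rec.name)} | {normalise(s) for s in rec.synonyms}
        if cand_names & rec_names:
            return rec, "synonym"

    # 3. Structure identifier, when the candidate has one.
    if candidate.inchikey:
        for rec in records:
            if rec.inchikey == candidate.inchikey:
                return rec, "structure"

    return None
```

A ligand for which find_existing returns None would, in this picture, go
forward to full curation as a new entry.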

Interrogation of multiple databases and direct literature checks captured the
correct structural information, nomenclature and target mapping for each
ligand. All small molecules were resolved against a PubChem Compound
Identifier (CID) as a primary molecular identifier and representative chemical
structure (2). Each ligand was then uploaded into the resource with a unique
ID. The quantitative pharmacological activity data for each ligand were
captured from GRAC and uploaded.

Figure 1. Breakdown of scientific journals cited in the resource. The chart
shows the top 20 most cited journals in the resource, and the contribution of
each journal as a percentage of the total.

Ligands have individual pages (Figure 3) providing 2D chemical structures or
peptide sequences, calculated physico-chemical properties, classification and
approval status for human clinical use, the International Union of Pure and
Applied Chemistry (IUPAC) name and other names used as synonyms. International
Nonproprietary Names (INNs) are also currently provided for 730 compounds.
INNs are the official non-proprietary or generic names given to pharmaceutical
substances, as designated by the World Health Organization (WHO;
http://www.who.int/medicines/services/inn/en/). For small molecules,
simplified molecular input line entry specification (SMILES), the IUPAC
International Chemical Identifiers (InChI string and InChIKey) and Chemical
Abstracts Service (CAS) registry numbers (http://www.cas.org/index.html) are
provided. Peptides are specified by one- and three-letter amino acid
sequences, any post-translational modifications and details of their protein
precursors. Links are provided to corresponding entries in relevant
bioactivity and chemistry resources including BindingDB (16), Chemical
Entities of Biological Interest (ChEBI) (17), ChEMBL (1), ChemSpider (18),
DrugBank (19), Human Metabolome Database (HMDB) (20), PharmGKB (21), RCSB
Protein Data Bank (22), UniProt (13) and ZINC (23). Ligand pages also display
a list of structurally similar ligands and a summary of all biological
activity data for each compound across all the targets.
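
For readers unfamiliar with the two peptide notations mentioned above, the
following short sketch converts a one-letter sequence of the 20 standard amino
acids into its three-letter form. The function name and the hyphen separator
are choices made for this example, and modified or non-standard residues are
deliberately not handled.

```python
# Standard one-letter to three-letter amino acid codes.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(sequence: str, sep: str = "-") -> str:
    """Convert a one-letter peptide sequence to three-letter notation."""
    residues = []
    for position, code in enumerate(sequence.upper(), start=1):
        if code not in THREE_LETTER:
            raise ValueError(f"unknown residue {code!r} at position {position}")
        residues.append(THREE_LETTER[code])
    return sep.join(residues)


if __name__ == "__main__":
    # Angiotensin II, an endogenous peptide ligand.
    print(to_three_letter("DRVYIHPF"))
```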

The ligand page includes an option to display the results for InChIKey
searching in Google, the utility of which has recently been described (24).
While the entire Key is used for exact-match searches of ChemSpider, the
Google search uses just the inner 'layer' of 14 characters approximating to
the basic molecular connectivity. It will thus retrieve all related entries
with isomeric differences encoded in the outer layer of the Key. The results,
typically returned in <0.5 s with very high specificity, are the matches from
over 50 million InChIKeys cached by Google from a wide range of databases and
web resources.
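
The split between the 14-character connectivity part and the remainder of the
Key can be shown in a few lines. This is a sketch based on the standard
InChIKey layout (a 14-letter block, a 10-letter block and a single final
letter, separated by hyphens); the function names are invented for this
example and it does not reproduce how the ligand page builds its Google query.

```python
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")


def connectivity_block(inchikey: str) -> str:
    """Return the first 14-character block of an InChIKey."""
    match = INCHIKEY_PATTERN.match(inchikey.strip().upper())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return match.group(1)


def same_connectivity(key_a: str, key_b: str) -> bool:
    """True when two Keys share the connectivity block, whatever follows it."""
    return connectivity_block(key_a) == connectivity_block(key_b)


if __name__ == "__main__":
    aspirin = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
    # Searching on this block alone also finds entries that differ only
    # in the later blocks of the Key.
    print(connectivity_block(aspirin))
```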



IMPLEMENTATION 

The data are held in a PostgreSQL relational database
(http://www.postgresql.org), with the exception of ligand structures and
physico-chemical properties, which are stored in an Oracle database (Oracle
Corporation, Redwood Shores, CA, USA). Curators use custom-built Java (Oracle
Corporation, Redwood Shores, CA, USA) software to enter and edit data. The
public web interface is implemented using HTML, CSS and JavaScript components
generated dynamically on the server side by Java servlets and Java Server
Pages. The web application runs in the Apache Tomcat servlet container
(http://tomcat.apache.org/) on a Linux platform. Ligand structure-based
searching is implemented with the Pinpoint chemical cartridge (Dotmatics
Limited, Bishops Stortford, UK) and chemical structure editing capability is
provided by the MarvinSketch chemical editor (ChemAxon Limited, Budapest,
Hungary). Ligand chemical structure formats and identifiers were generated
using the Open Babel software (25). IUPAC names were generated using JChem for
Excel (ChemAxon Limited, Budapest, Hungary) and physico-chemical properties
were generated using the Chemistry Development Kit (26). Ligand images were
created using the NCI/CADD Chemical Identifier Resolver from the National
Cancer Institute (http://cactus.nci.nih.gov/chemical/structure). Small
molecule ligands with similar structures were clustered using Pipeline Pilot
(Accelrys, San Diego, CA, USA) and peptides with similar sequences were
clustered using h-cd-hit, part of the CD-HIT Suite (27).

Table 1. Database statistics

Target class                                   Number of targets
7TM receptors                                  400
GPCRs including orphans                        394
Orphan GPCRs                                   130
Other 7TM proteins                             6
Nuclear hormone receptors                      48
Catalytic receptors                            223
Ligand-gated ion channels                      84
Voltage-gated ion channels                     142
Other ion channels                             49
Enzymes                                        1008
Transporters                                   503
Other protein targets                          28
Total number of targets                        2485

Chemical class                                 Number of ligands
Synthetic organics
Metabolites                                    550
Endogenous peptides                            687
Other peptides including synthetic peptides    1089
Natural products                               161
Antibodies                                     10
Inorganics                                     55
Others                                         8
Approved drugs                                 559
Withdrawn drugs                                11
Drugs with INNs                                857
Radioactive ligands                            550
Total number of ligands                        6064

Number of synonyms                             51189
Number of binding constants                    41076
Number of references                           21774
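
As a rough illustration of how targets, ligands and their quantitative
interactions can be related in a relational store, the sketch below builds a
three-table toy database. It is not the actual Guide to PHARMACOLOGY schema:
the table and column names are hypothetical, the target Object ID of 1 is made
up, and SQLite (from the Python standard library) stands in for PostgreSQL so
that the example is self-contained. The atorvastatin values and ligand ID 2949
are those quoted in the comparison section of this article.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical three-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE target (object_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE ligand (ligand_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE interaction (
            object_id INTEGER NOT NULL REFERENCES target(object_id),
            ligand_id INTEGER NOT NULL REFERENCES ligand(ligand_id),
            species   TEXT NOT NULL,
            action    TEXT NOT NULL,
            parameter TEXT NOT NULL,   -- e.g. Ki or IC50
            value_nm  REAL NOT NULL    -- concentration in nM
        );
        """
    )
    conn.execute("INSERT INTO target VALUES (1, 'HMGCR')")  # ID is made up
    conn.execute("INSERT INTO ligand VALUES (2949, 'atorvastatin')")
    conn.executemany(
        "INSERT INTO interaction VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 2949, "human", "inhibitor", "Ki", 14.0),
            (1, 2949, "human", "inhibitor", "IC50", 8.0),
            (1, 2949, "rat", "inhibitor", "IC50", 1.16),
        ],
    )
    return conn


def activities_for(conn: sqlite3.Connection, ligand_name: str) -> list:
    """List (target, species, parameter, value in nM) rows for one ligand."""
    return conn.execute(
        """
        SELECT t.name, i.species, i.parameter, i.value_nm
        FROM interaction AS i
        JOIN target AS t ON t.object_id = i.object_id
        JOIN ligand AS l ON l.ligand_id = i.ligand_id
        WHERE l.name = ?
        ORDER BY i.species, i.parameter
        """,
        (ligand_name,),
    ).fetchall()


if __name__ == "__main__":
    for row in activities_for(build_demo_db(), "atorvastatin"):
        print(row)
```

Keeping each quantitative value in its own interaction row, keyed by target
and ligand, is one simple way to let a ligand page and a target page draw on
the same data.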

WEB INTERFACE 

Users can access 'Target' and 'Ligand' lists and search tools directly from
the portal homepage, as well as from the navigation bar at the top of every
subsequent webpage. Each class of target (e.g. transporters, enzymes) is
listed according to protein family (e.g. ATP-binding cassette family, amino
acid hydroxylases). The portal is designed to provide users with access to two
views of pharmacologically relevant data on the targets in the database. The
organization and content of these two complementary views are described below:

(1) Users are initially presented with concise, searchable overviews of the
    properties of each family of targets. Data on all members of a target
    family, or subfamily, are presented on a single webpage (Figure 3). The
    page for each target family includes a brief overview of the properties of
    the target group. Details are provided on approved nomenclature (where
    applicable, approved by NC-IUPHAR) and synonyms, human, mouse and rat gene
    names and links to the HGNC, MGD, RGD, Ensembl and UniProt databases.
    Quantitative data are provided on recommended ligands classified by their
    mode of action (e.g. agonists, antagonists, substrates, inhibitors and
    radiolabelled ligands) and other information specific to the class of
    target (e.g. the signal transduction mechanisms used by GPCRs, or the
    biophysical properties of ion channels). Overall, the data focus on human
    proteins and include only key pharmacological agents, chosen because they
    are likely to be the most useful in the laboratory (i.e. they are
    selective and available by donation, or from commercial sources). A list
    of review articles recommended as further reading, key references and
    additional commentary (highlighting, for example, where species
    differences, or ligand metabolism, are potential confounding factors) are
    also provided. These pages are designed to serve as an introduction to a
    family of targets and are a useful entry point into the literature for
    newcomers to a particular field.
(2) From the family overview pages, users can then navigate (via the 'More
    detailed page' links, see Figure 3) to database pages with more in-depth
    information for a subset of important targets, providing expanded views of
    the pharmacology, genetics, functions and pathophysiology. These include a
    longer introduction to the family and separate pages providing a
    comprehensive description of each target and its function, with
    information on protein structure, ligand interactions, signalling
    mechanisms, tissue distribution, functional assays and biologically
    important variants (e.g. single nucleotide polymorphisms and splice
    variants). Reported ligand interactions may include endogenous ligands,
    current and historical licensed and experimental drugs, and available
    radiolabelled ligands, along with information on their actions (e.g.
    agonist, allosteric modulator, inhibitor) and quantitative data, where
    possible from multiple literature sources. Comparative data for mouse and
    rat species are also listed. In addition, the phenotypes resulting from
    altered gene expression (e.g. in genetically altered animals or in human
    genetic disorders) are described. An extensive set of links is provided to
    other resources including protein, gene, structure, disease and drug
    target databases. Family-specific information and database links are also
    provided, such as Enzyme Commission (EC) numbers and links to the KEGG
    BRITE hierarchy describing enzymatic reactions (28). For further details
    on the types of information that are provided in the detailed view see
    previous publications (3,15,29).

All literature citations in both views are linked to PubMed, and all ligand
entries are linked to individual ligand pages providing additional information
(as described in the section on 'CONTENT AND DATA CURATION' above).

Figure 2. The Guide to PHARMACOLOGY curation process and organizational chart.

The interface includes a simple search box where users can enter keywords such
as ligand or target names, and advanced search tools which allow searches by
specific database field, database identifier (e.g. Ensembl ID), chemical
identifier (e.g. standard InChIKey, CAS registry number) or PubMed identifier.
Chemical structure searches can also be performed by providing a structure in
SMILES format, or drawing a chemical structure using the structure editor. The
search tool can perform exact match, substructure, similarity and
SMARTS-pattern searches
(http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html). The chemical
structure editor is also accessible from ligand pages; clicking on the ligand
image loads the structure into the editor where it can be modified and used to
search the database. Search results indicate which database fields matched the
query term, and links are provided to the relevant database entries.

Figure 3. Screenshot of the Cannabinoid receptor family page in the Guide to
PHARMACOLOGY, with overlaying screenshots of a typical ligand page and
reference page with link-out to PubMed. Also shown is a link to the 'More
detailed page' of the CB1 receptor with a screenshot of the top section of the
target page showing the 'Contents' table listing the types of information
available for this target.

Extensive help pages and a tutorial on how to use the resource are also
provided. The help page can be accessed via linked icons within database
fields as well as from the navigation menu and home page. The help page
includes definitions of terms used to describe the data displayed on the site,
in addition to providing a detailed guide to using the various search
functions.



COMPARISON WITH OTHER RESOURCES 

There are other databases that have a degree of conceptual and content overlap
with the Guide to PHARMACOLOGY, some of which are included in this issue. Of
these, ChEMBL, DrugBank and Therapeutic Target Database (TTD) (30) are the
closest. However, the Guide to PHARMACOLOGY differs from these resources in a
number of important ways. Firstly, we restrict the range of protein targets
and ligands to those most relevant to therapeutics and drug discovery, chosen
with the exercise of curatorial judgement and backed by our network of
experts, with a focus on the quality and depth of annotation. Secondly, the
content is subject to review and quality control, not only by our
international expert committee members operating as a de facto network of
'super-curators', but also via user feedback. Thirdly, we curate activity data
for research compounds from primary literature sources, including posters and
patents, rather than from review articles, with a focus on the interactions of
each compound with its data-supported primary target (e.g.
angiotensin-converting enzyme (ACE) for captopril). Fourthly, the data can be
annotated with free-text comments that would otherwise not easily fit into a
database schema. These include information on alternative isomers and salt
forms. An example is the set of eight approved drug-prodrug pairs for ACE
inhibitors, which present a particular curatorial challenge (e.g. see
http://www.guidetopharmacology.org/GRAC/LigandDisplayForward?ligandId=6352).
These 16 structures are not both explicitly linked and activity-mapped in
other databases.

Another example that illustrates the differences between these databases is
atorvastatin. In the Guide to PHARMACOLOGY
(http://www.guidetopharmacology.org/GRAC/LigandDisplayForward?tab=biology&ligandId=2949),
there are three activity mappings between this ligand and the primary drug
target hydroxymethylglutaryl-CoA reductase (HMGCR): a Ki (14 nM) and an IC50
(8 nM) for the human enzyme, together with an IC50 for rat (1.16 nM). The
equivalent DrugBank entry (DB01076) is mapped to 3 targets, 11 enzymes and 9
transporters, but these include associations from the literature that are not
all supported by directly measured molecular interactions. The ChEMBL entry
(CHEMBL1487) is assay-mapped to 117 proteins and lists 217 IC50 values,
including proteins in the DRUGMATRIX screen and some antimalarial parasite
results. There are four IC50 values for the rat and three for the human
enzyme. In comparison, the two literature references for atorvastatin in TTD
are not the same as those from the other three sources. Mapping differences
between ChEMBL, DrugBank and TTD have previously been explored in detail
(24,31), but the overall picture between these and the Guide to PHARMACOLOGY
is one of complementarity. We thus suggest that pharmacologically oriented
users might find the curatorially selected set of stringent activity mappings
in the Guide to PHARMACOLOGY a simpler entry point (indeed we designed it with
this in mind) but we provide extensive linking to the other high-value
resources.
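
When comparing potency values such as the three atorvastatin figures quoted
above, it is common practice in pharmacology to work on a negative logarithmic
scale (pKi, pIC50), where a larger number means higher potency. The sketch
below applies that standard transformation to the quoted values. The function
and variable names are invented for this example, and the text above does not
state which scale any of the compared databases uses for these entries.

```python
import math


def p_value(value_nm: float) -> float:
    """Return -log10 of a concentration given in nM (converted to molar)."""
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(value_nm * 1e-9)


# Atorvastatin against HMGCR, values in nM as quoted in the text.
ATORVASTATIN_HMGCR = [
    ("human", "Ki", 14.0),
    ("human", "IC50", 8.0),
    ("rat", "IC50", 1.16),
]


def summarise(records):
    """Convert (species, parameter, nM) rows to (species, pX label, pX value)."""
    return [
        (species, "p" + parameter, round(p_value(value_nm), 2))
        for species, parameter, value_nm in records
    ]


if __name__ == "__main__":
    for species, label, value in summarise(ATORVASTATIN_HMGCR):
        print(f"{species:5s} {label:5s} {value:.2f}")
```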

SUMMARY AND FUTURE DIRECTIONS 

Our goal is to complete a stringently curated direct mapping (where the
primary literature data permit) between chemical structures and their primary
molecular targets, initially for targets of approved drugs, but extending this
to clinical and research targets. Published listings and the exact definitions
for these categories vary widely, but indicate a range of ~200-300 for the
former and ~500-1000 for the latter (32-36). Possible reasons for disparities
in these numbers are indicated in database comparison reports (24,31). We are
also in the process of updating our ligand structure submissions to PubChem,
facilitating UniProt cross references for their targets and reviewing new
information sources for possible inclusion.

The creation of the new portal reflects our intention to develop the resource
into a comprehensive online guide, which will include educational resources,
and to produce a 'Concise Guide to PHARMACOLOGY', to be published in PDF
format at two-yearly intervals, as a supplement to the British Journal of
Pharmacology. The 'Concise Guide to PHARMACOLOGY', which replaces GRAC, will
be a biennial snapshot of succinct overviews of the properties of each target
family, intended to be a quick desktop reference guide. Additionally, this
will provide a permanent record (DOI; digital object identifier) that will
survive database updates and therefore allow the precise context of the
database to be understood at any time in the future (37).

Since the Guide to PHARMACOLOGY portal now integrates data from the printed
GRAC compendium and IUPHAR-DB, we are planning a phased retirement of
IUPHAR-DB. The current URL (http://www.iuphar-db.org) will remain active, with
appropriate notices directing users to the Guide to PHARMACOLOGY portal.

DATA ACCESS 

The Guide to PHARMACOLOGY is available online at
http://www.guidetopharmacology.org. The website includes downloadable files
containing current receptor and channel lists, NC-IUPHAR nomenclature,
synonyms, genetic information, HGNC gene nomenclature and identifiers, and
other database accessions. Other file formats are available by emailing
enquiries@guidetopharmacology.org. Information on linking to Guide to
PHARMACOLOGY pages is provided at
http://www.guidetopharmacology.org/linking.jsp. To further facilitate external
programmatic and user access to the database, we are developing an application
programming interface (API) and Web services. This will allow our content to
be exploited in new integration initiatives such as Open PHACTS (38), of which
we are already an associate member. The database is licensed under the Open
Data Commons Open Database License (ODbL)
(http://opendatacommons.org/licenses/odbl/), and its contents are licensed
under the Creative Commons Attribution-ShareAlike 3.0 Unported license
(http://creativecommons.org/licenses/by-sa/3.0/).
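
Until the API is available, links to individual ligand pages can be assembled
by hand. The sketch below follows the pattern of the two ligand URLs quoted in
the comparison section (ligandId, with an optional tab parameter); it is an
inference from those two examples only, the function name is invented, and the
linking page cited above remains the reference for supported link formats.

```python
from typing import Optional
from urllib.parse import urlencode

# Path taken from the ligand URLs quoted earlier in this article.
LIGAND_BASE_URL = "http://www.guidetopharmacology.org/GRAC/LigandDisplayForward"


def ligand_page_url(ligand_id: int, tab: Optional[str] = None) -> str:
    """Build a ligand page link; 'biology' is the only tab value seen in the text."""
    if ligand_id <= 0:
        raise ValueError("ligand_id must be a positive integer")
    params = {}
    if tab is not None:
        params["tab"] = tab
    params["ligandId"] = ligand_id
    return LIGAND_BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(ligand_page_url(6352))
    print(ligand_page_url(2949, tab="biology"))
```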

CITING THE RESOURCE 

For a general citation of the resource we recommend citing this article.
Citation formats for specific target pages are provided on the website.